Name: __ Class: __ Date: __


Team Seas - River Pollution

AIM - To analyse plastic discharge by rivers using data from the Ocean Cleanup Project

Difficulty: Easy

Start:

This activity is inspired by the work of the Ocean Cleanup Project and #TeamSeas started by YouTubers MrBeast and Mark Rober. It uses data from Meijer L. J. J. et al., 2021, Sci. Adv., More than 1000 rivers account for 80% of global riverine plastic emissions into the ocean, DOI: 10.1126/sciadv.aaz5803 which is provided here and has been downloaded into the Meijer2021_midpoint_emissions directory for this activity.

Explore the Ocean Cleanup Project's interactive map version of this data on their website here.

You can see the full #TeamSeas campaign to remove plastic from the Ocean here (I am in no way affiliated with any campaigns or organisations listed above).


Let's Go:

We are going to explore this river pollution dataset using GeoPandas. GeoPandas is built on Pandas and allows us to work with geospatial data.

You will need to have GeoPandas, contextily, matplotlib + ipywidgets, and adjustText installed to run all the code.

We can use GeoPandas to read in the shapefile that contains our geospatial data. Go ahead and run the next code cell.

Let's look and see how many rivers are in the dataset using .shape.

There are 31,819 rivers included in this dataset! Let's look at the first 5 rows using .head().

The dots_exten column gives each river's total annual plastic emissions in metric tons, and the geometry column holds the Point geometries marking the location of each river.
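The same calls on a small made-up GeoDataFrame with the same two columns (the values and coordinates below are invented for illustration):

```python
import geopandas as gpd
from shapely.geometry import Point

# Made-up rows standing in for the real dataset
rivers = gpd.GeoDataFrame(
    {"dots_exten": [12.5, 0.0, 640.0, 3.2, 88.0, 1.1]},
    geometry=[Point(-0.1, 51.5), Point(1.3, 50.9), Point(-1.4, 50.8),
              Point(103.8, 1.3), Point(120.9, 14.6), Point(-43.2, -22.9)],
    crs="EPSG:4326",
)

n_rows, n_cols = rivers.shape   # (number of rivers, number of columns)
print(f"{n_rows} rivers, {n_cols} columns")
print(rivers.head())            # first 5 rows by default

# Each geometry is a shapely Point; with lon/lat data .x is longitude, .y latitude
first = rivers.geometry.iloc[0]
print(first.geom_type, first.x, first.y)
```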

It would be good to know the minimum and maximum values of dots_exten. Use the next two code cells to print them out; you can treat rivers just like a normal Pandas DataFrame.
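Because a GeoDataFrame is a pandas DataFrame with an extra geometry column, ordinary pandas methods work on dots_exten. A sketch on made-up values:

```python
import pandas as pd

# Made-up annual emissions in metric tons; in the activity this is rivers
rivers = pd.DataFrame({"dots_exten": [12.5, 0.0, 640.0, 3.2, 88.0]})

print("min:", rivers["dots_exten"].min())
print("max:", rivers["dots_exten"].max())

# .describe() gives count, mean, quartiles, min and max in one call
summary = rivers["dots_exten"].describe()
print(summary)

# idxmax() returns the index label of the largest emitter
worst = rivers["dots_exten"].idxmax()
print("largest emitter is row", worst)
```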

At least one river in the dataset emits little or no plastic (close to $0\ \mathrm{t}$ per year).
On the other hand, the most polluting river emits $62{,}591.9\ \mathrm{t}$ of plastic each year!

Before we move on we should know which Coordinate Reference System (CRS) the data is stored in. Run the code below.

The CRS is WGS 84, a geographic (latitude/longitude) coordinate system, so positions are stored in degrees. More info on reference systems can be found here.
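A short sketch of inspecting a CRS on a made-up one-row GeoDataFrame, assuming the WGS 84 geographic CRS (EPSG:4326). It also shows to_crs, which reprojects the data; Web Mercator (EPSG:3857) is used only as an example because web map tiles commonly use it:

```python
import geopandas as gpd
from shapely.geometry import Point

rivers = gpd.GeoDataFrame(
    {"dots_exten": [12.5]},
    geometry=[Point(-0.1, 51.5)],
    crs="EPSG:4326",  # assumption: WGS 84 latitude/longitude
)

print(rivers.crs)                # the CRS object
print(rivers.crs.name)           # human-readable name
print(rivers.crs.is_geographic)  # True: coordinates are in degrees

# Reproject to Web Mercator, whose coordinates are in metres
rivers_3857 = rivers.to_crs(epsg=3857)
print(rivers_3857.geometry.iloc[0])
```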

Exploring with GeoPandas

There are two easy tools which we can use to visualise this dataset. The first is calling .plot() on our GeoPandas dataset. This will plot all the river points using Matplotlib.

Run the code below to see the figure.

This is great for a quick look at the data, but the figure is hard to read and has no scale or colourbar!
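One quick improvement, sketched on made-up points: pass column= so the points are coloured by dots_exten, and legend=True so GeoPandas draws a colourbar. The colour map, marker size and figure size are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; not needed inside Jupyter
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Point

rivers = gpd.GeoDataFrame(
    {"dots_exten": [12.5, 0.0, 640.0, 3.2, 88.0]},
    geometry=[Point(-0.1, 51.5), Point(1.3, 50.9), Point(-1.4, 50.8),
              Point(103.8, 1.3), Point(120.9, 14.6)],
    crs="EPSG:4326",
)

fig, ax = plt.subplots(figsize=(10, 5))
rivers.plot(
    ax=ax,
    column="dots_exten",   # colour each point by its emissions
    cmap="viridis",
    markersize=5,
    legend=True,           # numeric column, so this adds a colourbar
    legend_kwds={"label": "Plastic emissions (t / year)"},
)
ax.set_xlabel("Longitude (°)")
ax.set_ylabel("Latitude (°)")
```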

We can use .explore() to create an interactive figure of our data.
This figure may be slow to respond to hovering, panning, etc. since the dataset is so large!

If you feel later on that the figure above is slowing the notebook down, restart the kernel and skip the cell above when re-running the previous code.

Plotting the 100 Worst Polluters

We are going to build our own plot where we can highlight certain rivers.

Let's say we want the 100 largest polluting rivers. Complete the code below to select those rivers.

Now let's see what percentage of the total plastic emission is from just those rivers.
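One way to do both steps, sketched on 200 made-up rivers with emissions 1, 2, ..., 200 t/year. nlargest is used here; sort_values(ascending=False).head(100) does the same job:

```python
import pandas as pd

# Made-up emissions for 200 rivers: 1, 2, ..., 200 t/year
rivers = pd.DataFrame({"dots_exten": range(1, 201)})

# The 100 rows with the largest dots_exten values
big_rivers = rivers.nlargest(100, "dots_exten")

# Share of all emissions coming from those rivers, as a percentage
share = 100 * big_rivers["dots_exten"].sum() / rivers["dots_exten"].sum()
print(f"Top 100 rivers emit {share:.1f}% of the total")
```

For this toy data the top half of the rivers produce about three quarters of the emissions; the real data is far more skewed.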

Wow, such a large percentage from so few rivers! Let's plot these on a new figure alongside all the rivers.

The following code assigns sizes to the river points based on their pollution values.
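The notebook's own sizing code isn't reproduced here. One common approach, sketched below as an assumption rather than the activity's method, rescales dots_exten linearly into a fixed range of marker areas. The function name marker_sizes and the 1 to 50 range are hypothetical:

```python
import numpy as np


def marker_sizes(values, min_size=1.0, max_size=50.0):
    """Linearly rescale values to marker areas in [min_size, max_size].

    Hypothetical helper for illustration; matplotlib marker sizes are areas
    in points squared.
    """
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:  # all values equal: avoid dividing by zero
        return np.full(v.shape, min_size)
    return min_size + (v - v.min()) / span * (max_size - min_size)


sizes = marker_sizes([0.0, 5.0, 10.0])
print(sizes)
```

The result can be passed straight to rivers.plot(markersize=...). With a very skewed distribution most points end up near min_size, so taking a square root or logarithm of the values first is a possible alternative.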

We can add a continent basemap under our data by loading in the Natural Earth dataset from GeoPandas.

Now all we have to do is set up our plot. Complete the code below so that it also plots the big_rivers data, just as it does for rivers.
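The layering idea, sketched with made-up data: every layer is drawn on the same ax, in call order, so the basemap goes first and the highlighted rivers last. The colours, sizes and labels are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Point, box

# Made-up stand-ins: two rectangular "land masses" and five rivers
world = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[box(-10, 40, 10, 60), box(90, -10, 130, 20)],
    crs="EPSG:4326",
)
rivers = gpd.GeoDataFrame(
    {"dots_exten": [12.5, 0.0, 640.0, 3.2, 88.0]},
    geometry=[Point(-0.1, 51.5), Point(1.3, 50.9), Point(-1.4, 50.8),
              Point(103.8, 1.3), Point(120.9, 14.6)],
    crs="EPSG:4326",
)
big_rivers = rivers.nlargest(2, "dots_exten")

fig, ax = plt.subplots(figsize=(12, 6))
# Basemap first, then all rivers, then the highlighted rivers on top
world.plot(ax=ax, color="lightgrey", edgecolor="white")
rivers.plot(ax=ax, color="tab:blue", markersize=2, label="All rivers")
big_rivers.plot(ax=ax, color="tab:red", markersize=20, label="Largest emitters")
ax.legend()
```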

If you're happy with this figure save it using the code cell below.
Change the name to something meaningful!
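The save cell itself isn't shown here; a minimal sketch using matplotlib's savefig on a placeholder figure. The file name is only an example of a descriptive name, and dpi and bbox_inches are optional tweaks. A temporary directory is used so the sketch leaves no files behind; in the notebook you would save into the working directory:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])  # placeholder for the rivers figure

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "top100_polluting_rivers_world.png")
# bbox_inches="tight" trims surplus white space around the axes
fig.savefig(path, dpi=200, bbox_inches="tight")
print("saved to", path)
```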

Pollution by Country

So far we have plotted all the rivers and the biggest 100. What about the countries that are the biggest polluters?

To get this data and plot it, we will have to merge our rivers DataFrame with the world DataFrame, which contains country info.

Use the sjoin(df1, df2, how='right') function to join these two datasets into a new river_country DataFrame below:
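A sketch of the join on made-up data: three square countries and four rivers, one of which lies outside every country. Both frames must share a CRS. With how="right" every country (and its polygon) is kept. Each river inside a country adds a row, and a country with no rivers gets NaN for dots_exten. Rivers that fall in no polygon are dropped, so it may be worth comparing totals on the real data, since an outlet could sit just offshore of a coarse coastline:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Made-up countries and rivers; the real world frame has more columns
world = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"]},
    geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10), box(40, 0, 50, 10)],
    crs="EPSG:4326",
)
rivers = gpd.GeoDataFrame(
    {"dots_exten": [5.0, 3.0, 7.0, 9.0]},
    geometry=[Point(1, 1), Point(2, 2), Point(25, 5), Point(100, 100)],
    crs="EPSG:4326",
)

# A spatial join only makes sense if both frames use the same CRS
assert rivers.crs == world.crs

river_country = gpd.sjoin(rivers, world, how="right")
print(river_country[["name", "dots_exten"]])

# Emissions from rivers that matched no country (NaN is skipped by .sum())
lost = rivers["dots_exten"].sum() - river_country["dots_exten"].sum()
print(f"Emissions not matched to any country: {lost} t")
```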

Now each river has an associated country in the 'name' column. We can use .groupby() to group the rows by the 'name', 'continent', 'pop_est' and 'gdp_md_est' columns and sum 'dots_exten' within each group. This gives us the total pollution for each country.
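A sketch of that groupby on a made-up pandas DataFrame shaped like river_country after the join (one row per river/country match, NaN where a country has no rivers):

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like river_country
river_country = pd.DataFrame({
    "name":       ["A", "A", "B", "C"],
    "continent":  ["X", "X", "Y", "Y"],
    "pop_est":    [1_000, 1_000, 5_000, 2_000],
    "gdp_md_est": [10.0, 10.0, 40.0, 15.0],
    "dots_exten": [5.0, 3.0, 7.0, np.nan],
})

country_plastics = (
    river_country
    .groupby(["name", "continent", "pop_est", "gdp_md_est"], as_index=False)["dots_exten"]
    .sum()  # NaN is skipped, so a country with no rivers sums to 0.0
)
print(country_plastics)
```

By default groupby drops groups whose key contains NaN, so a country with a missing pop_est or gdp_md_est would silently disappear; passing dropna=False to groupby keeps it.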

This country_plastics DataFrame has data from the original world DataFrame and the total plastic emissions in metric tons in the dots_exten column.

The only thing we are missing is the geometry values for each country so we can plot them.
Select only the 'name' and 'geometry' columns from world and store them in country_geometry.

All that's left is to join the country_plastics and country_geometry DataFrames on the 'name' column.
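A sketch of both steps on made-up data. Calling merge on the GeoDataFrame (country_geometry on the left) keeps the result a GeoDataFrame, so .plot() draws polygons. With a plain DataFrame on the left, the result would be a plain DataFrame:

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

world = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"], "pop_est": [1_000, 5_000, 2_000]},
    geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10), box(40, 0, 50, 10)],
    crs="EPSG:4326",
)
country_plastics = pd.DataFrame({"name": ["A", "B", "C"], "dots_exten": [8.0, 7.0, 0.0]})

# Selecting the geometry column along with 'name' keeps a GeoDataFrame
country_geometry = world[["name", "geometry"]]

# GeoDataFrame on the left, so the merged result is still a GeoDataFrame
country_plastics = country_geometry.merge(country_plastics, on="name")
print(type(country_plastics).__name__, country_plastics.crs)
```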

Before we move on, check that the population and GDP data haven't been altered! You might like to play around with them later.

Print out the row in world for Albania and compare it with country_plastics above. If all is well, they should contain the same population and GDP values.
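A sketch of the spot check, plus a way to compare every country at once, on made-up tables. "A" plays the role of Albania here. Note that NaN compares unequal to NaN, so countries with missing values would be flagged:

```python
import pandas as pd

cols = ["name", "pop_est", "gdp_md_est"]
# Made-up tables standing in for world and country_plastics
world = pd.DataFrame({"name": ["A", "B", "C"],
                      "pop_est": [1_000, 5_000, 2_000],
                      "gdp_md_est": [10.0, 40.0, 15.0]})
country_plastics = pd.DataFrame({"name": ["A", "B", "C"],
                                 "pop_est": [1_000, 5_000, 2_000],
                                 "gdp_md_est": [10.0, 40.0, 15.0],
                                 "dots_exten": [8.0, 7.0, 0.0]})

# Single-country spot check ("Albania" in the activity, "A" here)
print(world.loc[world["name"] == "A", cols])
print(country_plastics.loc[country_plastics["name"] == "A", cols])

# Line both tables up by name and compare every country's pop/GDP
left = world[cols].set_index("name").sort_index()
right = country_plastics[cols].set_index("name").sort_index()
common = left.index.intersection(right.index)
mismatched = (left.loc[common] != right.loc[common]).any(axis=1)
changed = mismatched[mismatched].index.tolist()
print("Countries with changed pop/GDP:", changed)
```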

Now we can plot a choropleth map, which colours each country based on its pollution. Run the code below to generate the plot.
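A minimal choropleth sketch on three made-up square "countries": column= chooses the values to colour by, and legend=True adds a colourbar. The colour map, edge styling and legend options are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

country_plastics = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"], "dots_exten": [8.0, 7.0, 0.0]},
    geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10), box(40, 0, 50, 10)],
    crs="EPSG:4326",
)

fig, ax = plt.subplots(figsize=(12, 6))
country_plastics.plot(
    ax=ax,
    column="dots_exten",          # fill colour from total emissions
    cmap="OrRd",
    edgecolor="black", linewidth=0.3,
    legend=True,
    legend_kwds={"label": "Plastic emissions (t / year)", "shrink": 0.6},
)
ax.set_axis_off()
```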

This plot makes it look like only 4-5 countries are polluting!

This is obviously not true. Most countries' totals lie well below $50{,}000\ \mathrm{t}$, so a few outliers are distorting our visualisation with this colour scheme. Let's print the 10 worst offenders.
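One way to list them, sketched on 15 made-up countries:

```python
import pandas as pd

# Made-up totals: Country i emits i**2 t/year
country_plastics = pd.DataFrame({
    "name": [f"Country {i}" for i in range(15)],
    "dots_exten": [float(i * i) for i in range(15)],
})

worst10 = (
    country_plastics.nlargest(10, "dots_exten")[["name", "dots_exten"]]
    .reset_index(drop=True)
)
print(worst10)
```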

We can change how our data is coloured by specifying the scheme. Let's see if using quantiles to colour our data helps.

Better, but still not great: the top quantile lumps together many countries with a huge range of pollution values. Before we try other schemes, save this figure for reference.

Remember to choose a sensible name.

A list of schemes which alter how the data is binned and coloured can be found here. Change the scheme to see if there is a better one to represent our data.

Tricky! One of the challenges at the end of this activity is to split the plot so that high polluters are plotted separately, with a different colour scheme from the low-polluting countries.

Local River Data

Let's now look at some rivers local to you or your country.

We can split up the geometry column in rivers to create columns for longitude and latitude with the code below.
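For Point geometries, the .x and .y accessors return the coordinates as Series. In a longitude/latitude CRS, x is longitude and y is latitude. A sketch on two made-up rivers (.x and .y raise an error for non-point geometries):

```python
import geopandas as gpd
from shapely.geometry import Point

rivers = gpd.GeoDataFrame(
    {"dots_exten": [12.5, 0.0]},
    geometry=[Point(-0.1, 51.5), Point(1.3, 50.9)],
    crs="EPSG:4326",
)

# x is longitude, y is latitude for this CRS
rivers["lon"] = rivers.geometry.x
rivers["lat"] = rivers.geometry.y
print(rivers[["lon", "lat", "dots_exten"]])
```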

For the South / South East UK I know I roughly need latitude values between $50^{\circ}$ and $52^{\circ}$ and longitude values between $-2^{\circ}$ and $2^{\circ}$.

Run the following code to select rivers in this part of the UK. You will be able to enter your own region later.

Let's see how many rivers we are left with:
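A sketch of the selection and count on made-up rivers using boolean masks on the lon/lat columns (between includes both ends). GeoPandas' .cx indexer does the same bounding-box selection straight from the geometry, taking the x (longitude) range first:

```python
import geopandas as gpd
from shapely.geometry import Point

rivers = gpd.GeoDataFrame(
    {"dots_exten": [12.5, 0.0, 640.0, 3.2, 88.0, 4.0]},
    geometry=[Point(-0.1, 51.5), Point(1.3, 50.9), Point(-1.4, 50.8),
              Point(103.8, 1.3), Point(120.9, 14.6), Point(-3.0, 51.0)],
    crs="EPSG:4326",
)
rivers["lon"] = rivers.geometry.x
rivers["lat"] = rivers.geometry.y

lat_min, lat_max = 50, 52
lon_min, lon_max = -2, 2

mask = rivers["lat"].between(lat_min, lat_max) & rivers["lon"].between(lon_min, lon_max)
se_uk = rivers[mask]
print(len(se_uk), "rivers in the box")

# Same selection via .cx[lon range, lat range]
se_uk_cx = rivers.cx[lon_min:lon_max, lat_min:lat_max]
```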

Now we can plot the rivers.

Your Local Rivers

Before we move on, head over to OpenStreetMap and navigate to an area of coastline you would like to focus on.

Drag the green marker to the top right of the area and the red marker to the bottom left. The latitude and longitude of each marker will be displayed in the directions box at the top left.

Go ahead and:

This code wraps up what we did for the South East UK into a function that accepts any latitude/longitude bounds. Enter your latitude/longitude values into the code cell below using this format:

local_rivers(lat_min, lat_max, lon_min, lon_max, conf)

Then run the code and save your figure if you're happy.

Each river has a handy label! The local_config dictionary allows you to control things like the marker colour, the marker scale and the background opacity. Try out the config below with your latitude/longitude values. What does it do?
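To show how a config dictionary can drive a plotting function, here is a hypothetical, simplified stand-in for local_rivers. It is not the notebook's implementation, and its config keys ("color", "scale", "alpha") and point labels (emissions values) are invented for this sketch. Use the real function's docstring for the actual local_config keys:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Point


def local_rivers_sketch(rivers, lat_min, lat_max, lon_min, lon_max, conf=None):
    """Hypothetical, simplified stand-in for the notebook's local_rivers()."""
    # Defaults, overridden by anything passed in conf
    conf = {"color": "tab:red", "scale": 1.0, "alpha": 0.3, **(conf or {})}

    # Bounding-box selection: .cx takes [lon range, lat range]
    local = rivers.cx[lon_min:lon_max, lat_min:lat_max]

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.set_facecolor((0.6, 0.8, 1.0, conf["alpha"]))  # semi-transparent "sea"
    sizes = conf["scale"] * local["dots_exten"].clip(lower=1).to_numpy()
    local.plot(ax=ax, color=conf["color"], markersize=sizes)

    # Label each outlet with its emissions
    for point, value in zip(local.geometry, local["dots_exten"]):
        ax.annotate(f"{value:.0f} t", xy=(point.x, point.y),
                    xytext=(3, 3), textcoords="offset points", fontsize=8)

    ax.set_xlim(lon_min, lon_max)
    ax.set_ylim(lat_min, lat_max)
    return ax


# Made-up rivers, three of them inside the South / South East UK box
rivers = gpd.GeoDataFrame(
    {"dots_exten": [12.5, 0.0, 640.0, 88.0]},
    geometry=[Point(-0.1, 51.5), Point(1.3, 50.9), Point(-1.4, 50.8), Point(120.9, 14.6)],
    crs="EPSG:4326",
)
ax = local_rivers_sketch(rivers, 50, 52, -2, 2, {"color": "black", "scale": 0.5})
```

Merging the defaults with the user's dictionary ({**defaults, **conf}) means you only need to pass the settings you want to change.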

You can now alter the config as much as you like using the function docstring as guidance.

Have a go at doing some other areas you're interested in. Do you know any of the rivers whose outlets are plotted?

Saving your Work

Always save your work at the end and download the .ipynb and other files you need to keep!

Over to You

Have a go at the tasks below. I haven't provided solutions, and there are probably a few ways to solve each one! If you get stuck, you can consult the Python and NumPy docs at https://docs.python.org/3/ and https://numpy.org/doc/stable/, or find help on websites such as www.w3schools.com, www.python.org, www.learnpython.org and www.stackoverflow.com. You can also ask your instructor for help or email me at astrodimitrios@gmail.com.

Task 1: Go back to the custom plot where we added all the rivers. Alter the plot to show labels for the largest polluting rivers. You can also try highlighting different rivers based on percentiles.
Task 2: Alter the choropleth plot so that the 10 largest polluting countries are plotted separately, with a different colour scheme from the rest of the countries.
Task 3: Create a new local river plot and annotate the names of some river outlets on the figure.
Task 4: Check out the #TeamSeas campaign and the Ocean Cleanup Project by clicking on the links at the start.

References

Meijer L. J. J. et al., 2021, Sci. Adv., More than 1000 rivers account for 80% of global riverine plastic emissions into the ocean, DOI: 10.1126/sciadv.aaz5803. The data is provided here.

Acknowledgements

Thanks to Lourens Meijer from the Ocean Cleanup for clarifying my question on the geospatial dataset.

Sharing

If you share, use or modify this activity in any way, please use the citation in this txt file.
Please contact me at astrodimitrios@gmail.com with any suggestions, mistakes found, or general questions about teaching astronomy with Python.

© Dimitrios Theodorakis GNU General Public License v3.0 https://github.com/astroDimitrios/Astronomy